Searching Recorded Speech Based on the Temporal Extent of Topic Labels

Authors

Douglas W. Oard

Anton Leuski

Marina Del Rey

Abstract

Recorded speech poses unusual challenges for the design of interactive end-user search systems. Automatic speech recognition is sufficiently accurate to support the automated components of interactive search systems in some applications, but finding useful recordings among those nominated by the system can be difficult because listening to audio is time-consuming and because recognition errors and speech disfluencies make it difficult to mitigate that effect by skimming automatic transcripts. Support for rapid browsing based on supervised learning for automatic classification has shown promise, however, and a segment-then-label framework has emerged as the dominant paradigm for applying that technique to news broadcasts. This paper argues for a more general framework, which we call an activation matrix, that provides a flexible representation for the mapping between labels and time. Three approaches to the generation of activation matrices are briefly described, with the main focus of the paper then being the use of activation matrices to support search and selection in interactive systems.
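
As a rough illustration of the idea of mapping labels to time, the following sketch stores one row of scores per topic label and one column per fixed-length time step. Everything here is an assumption made for illustration: the names `from_segments` and `active_spans`, the dictionary-of-lists layout, the score range, and the 0.5 threshold are hypothetical and are not taken from the paper, whose three generation approaches are not reproduced here. The sketch only shows that a segment-then-label result can be written in this form, and that graded scores yield label extents that need not align with segment boundaries.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: label -> list of activation scores, one per time step.
ActivationMatrix = Dict[str, List[float]]


def from_segments(
    segments: Sequence[Tuple[int, int, Sequence[str]]], n_steps: int
) -> ActivationMatrix:
    """Build a matrix from (start, end, labels) segments; end is exclusive.

    This is the segment-then-label special case: every label assigned to a
    segment is fully active (1.0) for exactly that segment's time steps.
    """
    matrix: ActivationMatrix = {}
    for start, end, labels in segments:
        for label in labels:
            row = matrix.setdefault(label, [0.0] * n_steps)
            for t in range(start, min(end, n_steps)):
                row[t] = 1.0
    return matrix


def active_spans(row: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) step ranges where the score is >= threshold."""
    spans: List[Tuple[int, int]] = []
    start = None
    for t, score in enumerate(row):
        if score >= threshold and start is None:
            start = t  # a new active span begins here
        elif score < threshold and start is not None:
            spans.append((start, t))  # span ended at the previous step
            start = None
    if start is not None:
        spans.append((start, len(row)))  # span runs to the end of the recording
    return spans


if __name__ == "__main__":
    segmented = from_segments(
        [(0, 3, ["politics"]), (3, 6, ["sports", "politics"])], n_steps=8
    )
    print(active_spans(segmented["politics"]))
    print(active_spans(segmented["sports"]))
    # A graded row, as a classifier scoring each time step might produce.
    print(active_spans([0.1, 0.6, 0.9, 0.4, 0.7, 0.8]))
```

A search interface could use such spans to point a listener at the stretch of a recording where a label of interest is active, rather than at a whole pre-segmented story.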

Similar Resources

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...

Introducing Deep Modular Neural Networks with a Double Spatio-Temporal Structure to Improve Persian Continuous Speech Recognition

In this article, growable deep modular neural networks for continuous speech recognition are introduced. These networks can be grown to implement the spatio-temporal information of the frame sequences at their input layer as well as their labels at the output layer at the same time. The trained neural network with such double spatio-temporal association structure can learn the phonetic sequence...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems for both feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Effects of ageing on speed and temporal resolution of speech stimuli in older adults

Background: According to previous studies, most speech recognition disorders in older adults result from deficits in audibility and auditory temporal resolution. In this paper, the effect of ageing on time-compressed speech and auditory temporal resolution was studied using word recognition in continuous and interrupted noise. Methods: A time-compressed speech test (TCST) w...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Publication date: 2002
